Typical Cases of Annotators' Disagreement in Discourse Annotations in Prague Dependency Treebank
نویسندگان
چکیده
In this paper, we present the first results of the parallel Czech discourse annotation in the Prague Dependency Treebank 2.0. Having established an annotation scenario for capturing semantic relations crossing the sentence boundary in a discourse, and having annotated the first sections of the treebank according to these guidelines, we report now on the results of the first evaluation of these manual annotations. We give an overview of the process of the annotation itself, which we believe is to a large degree language-independent and therefore accessible to any discourse researcher. Next, we describe the inter-annotator agreement measurement, and, most importantly, we classify and analyze the most common types of annotators’ disagreement and propose solutions for the next phase of the annotation. The annotation is carried out on dependency trees (on the tectogrammatical layer), this approach is quite novel and it brings us some advantages when interpreting the syntactic structure of the discourse units.
منابع مشابه
Annotators' Certainty and Disagreements in Coreference and Bridging Annotation in Prague Dependency Treebank
In this paper, we present the results of the parallel Czech coreference and bridging annotation in the Prague Dependency Treebank 2.0. The annotation is carried out on dependency trees (on the tectogrammatical layer). We describe the inter-annotator agreement measurement, classify and analyse the most common types of annotators’ disagreement. On two selected long texts, we asked the annotators ...
متن کاملIntroducing the Prague Discourse Treebank 1.0
We present the Prague Discourse Treebank 1.0, a collection of Czech texts annotated for various discourse-related phenomena "beyond the sentence boundary". The treebank contains manual annotations of (1), discourse connectives, their arguments and senses, (2), textual coreference, and (3), bridging anaphora, all carried out on 50k sentences of the treebank. Contrary to most similar projects, th...
متن کاملAnnotation Tool for Discourse in PDT
We present a tool for annotation of se mantic intersentential discourse rela tions on the tectogrammatical layer of the Prague Dependency Treebank (PDT). We present the way of helping the annotators by several useful features implemented in the annotation tool, such as a possibility to combine surface and deep syntactic representation of sen tences during the annotation, a possibili ty to ...
متن کاملAnnotators' Agreement: The Case of Topic-Focus Articulation
The annotation of the Prague Dependency Treebank (PDT) is conceived of as a multilayered scenario that comprises also dependency representations (tectogrammatical tree structures, TGTS’s) of the underlying structure of the sentences. TGTS’s capture three basic aspects of the underlying structure of sentences: (a) the dependency tree structure, (b) the kinds of dependency syntactic relations, an...
متن کاملFrom Sentence to Discourse: Building an Annotation Scheme for Discourse Based on Prague Dependency Treebank
The present paper reports on a preparatory research for building a language corpus annotation scenario capturing the discourse relations in Czech. We primarily focus on the description of the syntactically motivated relations in discourse, basing our findings on the theoretical background of the Prague Dependency Treebank 2.0 and the Penn Discourse Treebank 2. Our aim is to revisit the present-...
متن کامل